Abstract. This work provides web users with copyright protection for their
Portable Document Format (PDF) documents by proposing efficient and easily
implementable techniques for PDF watermarking. Our techniques build on the
ideas of our recently proposed watermarking techniques for software, image,
and audio, thus expanding the class of digital objects that can be
efficiently watermarked through the use of self-inverting permutations. In
particular, we present various representations of a self-inverting
permutation π*, namely the 1D-representation, the 2D-representation, and the
RPG-representation, and show that these representations can be efficiently
applied to PDF watermarking. We first present an audio-based technique for
marking a PDF document T by exploiting the 1D-representation of a permutation
π*. Then, since pages of a PDF document T are 2D objects, we present an
image-based algorithm for encoding π* into T by first mapping the elements of
π* into a matrix A* and then using the information stored in A* to mark
invisibly specific areas of the PDF document T. Finally, we describe a
graph-based watermarking algorithm for embedding a self-inverting permutation
π* into the document structure of a PDF file T by exploiting the
RPG-representation of π* and the structure of a PDF document. We have
evaluated the embedding and extraction algorithms by testing them on PDF
documents with varied characteristics.


Keywords. Watermarking techniques; Text watermarking; PDF documents, 

1 Introduction

The information age has altered the way people communicate by breaking the
barriers imposed on communications by time, distance, and location, and has
undoubtedly impacted not only human activities but also global industry and
the economy. Communication has been greatly affected by the constant and rapid
evolution of many technologies, such as fiber optic, cellular, and satellite
technology, networking, digital transmission and compression, as well as
advanced computers and improved human-computer interaction. These technologies
allow the rapid transmission and storage of great amounts of information.

The digital era has already had extensive impacts on business, commerce,
education, services, and social life. The concepts of e-government,
e-learning, e-commerce, e-business, and e-publishing refer to people's
interaction in the digital world. In this world, people interact every day by
exchanging e-mails, instant messages, video, audio, images, and digital
documents. Part of the information transmitted is an increasing amount of
sensitive information, such as personal data, medical and financial records,
business information, government data, and legal documents. Another part of
the information available on the web is used to promote one's work or product.

Electronic documents are an extensively used medium for information exchange
over the internet and, because they are easy to copy and distribute, they are
susceptible to threats such as illegal copying, redistribution of copyrighted
documents, and plagiarism. Consequently, it has become more important to
protect electronic documents from malicious users in the digital world, and
copyright protection of digital content is a need that cannot be overlooked.
In the past, various methods such as encryption, steganography, and
watermarking have been used to address these problems. Among them, digital
watermarking is better suited to copyright protection than encryption and
steganography, since digital watermarking methods make it possible to identify
the original copyright owner of the content.

Recall that there are many reasons why you would want to use watermarks 
in digital documents: as a copying deterrent, as a means of identifying the source 
of a printed document, as a means of determining whether a document has been 
altered, etc. 

Attacks. Any action that a user can perform on a text and that can affect the
watermark, or its usefulness, is called an attack. According to [32], existing
attacks on text watermarking can be classified into three main categories:

o watermark attacks, 
o geometric attacks, and 
o system attacks. 

In a watermark attack, the adversary aims to detect and destroy the watermark,
without necessarily decoding the original message. In contrast to watermark
attacks, geometric attacks are blind attacks on watermarked text documents:
they require neither algorithmic knowledge of the watermarking technique nor
the watermarking key. Geometric attacks do not intend to remove the embedded
watermark itself, but to prevent it from serving its intended purpose by
altering the format or content of the watermarked text documents. This type of
attack includes reformatting, reproducing, sentence swapping, paragraph
shuffling, and the addition or deletion of words, sentences, and paragraphs.
System attacks use several signal processing tools such as principal component
analysis, independent component analysis, clustering, vector quantization,
etc.

Related Work. Text watermarking is an area of research that emerged after the
development of Internet and communication technologies; we mention that the
first reported effort on marking documents dates back to 1993 [5].

Generally, we can classify the previous work on digital text watermarking in 
the following four categories: 

o image based approach, 
o syntactic approach, 
o semantic approach, and 
o structural approach. 

In the image-based approach, a watermark is embedded in the text image.
Brassil, et al. were the first to propose text watermarking methods utilizing
the text image [4,5]; they also developed document watermarking schemes based
on line shifts, word shifts, as well as slight modifications to the characters
[6]. Maxemchuk, et al. [23,24,25] analyzed the performance of these methods,
while later Low, et al. further analyzed their efficiency. Huang and Yan
proposed a text watermarking method based on the average inter-word distance
in each line.

In the syntactic approach, the syntactic structure of the text is used to
embed the watermark. Atallah, et al. [3] proposed several methods of natural
language watermarking, which opened up a brand-new and challenging research
direction for text watermarking. Meral, et al. performed morpho-syntactic
alterations to the text to watermark it [26]; they also provided an overview
of available syntactic tools for text watermarking [27].

In the semantic approach, the semantics of the text are used to embed the
watermark. Atallah, et al. were the first to propose semantic watermarking
schemes [3]. Later, the synonym substitution method was proposed, in which the
watermark is embedded by replacing certain words with their synonyms [30].
Sun, et al. [29] proposed a noun-verb based technique for text watermarking,
which uses nouns and verbs parsed by semantic networks. Topkara, et al.
proposed a text watermarking algorithm that uses typos, acronyms, and
abbreviations in the text to embed the watermark [31]. Algorithms were also
developed to watermark text using the linguistic approach of presuppositions
[28], in which the discourse structure, meaning, and representations are
observed and utilized to embed watermark bits. Text pruning and grafting
algorithms have been developed as well, and another algorithm based on text
meaning representation (TMR) strings has also been proposed.

The structural approach is the most recent approach used for copyright
protection of text documents. In this approach, the text is not altered;
rather, it is used to logically embed the watermark. A text watermarking
algorithm for copyright protection that uses occurrences of double letters
(aa-zz) in the text has recently been proposed [15]. Recently, a significant
number of techniques have also been proposed in the literature which use
Portable Document Format (PDF) files as cover media in order to hide data.

Contribution. In this paper, in order to provide web users with copyright
protection of their digital documents, we present easily implemented
techniques for watermarking PDF documents. Our aim is to extend the class of
digital objects to which the proposed representations of a self-inverting
permutation, i.e., the 1D-representation, the 2D-representation, and the
RPG-representation, can be efficiently applied; note that the
RPG-representation is the encoding of a permutation π* as a reducible
permutation graph F[π*].

We first propose an image-based technique for marking the PDF document T by
exploiting the 1D-representation of a permutation π*. The embedding of a mark
is performed by increasing the distance (or, space) between two consecutive
words in a paragraph of the document T. The extraction algorithm operates in a
reverse manner.

Consequently, since pages of a PDF document T are two-dimensional objects, we
propose an algorithm for encoding a self-inverting permutation π* into a
document T by first mapping the elements of π* into an n* × n* matrix A* and
then using the information stored in A* to mark invisibly specific areas of
the PDF document T, resulting in the watermarked PDF document T_w. We also
propose an efficient algorithm for extracting the embedded self-inverting
permutation π* from the watermarked PDF document T_w by locating the positions
of the marks in T_w; this enables us to reconstruct the 2D-representation of
the self-inverting permutation π*.

Finally, we describe a watermarking algorithm for embedding a self-inverting
permutation into the document structure of a PDF file T, by exploiting the
graph representation of π* and the structure of a PDF document T. More
precisely, in light of the two encoding algorithms Encode_SiP.to.RPG-I and
-II, we present an algorithm for embedding a reducible permutation graph F[π*]
into a PDF document T. The main idea behind the proposed embedding algorithm
is a systematic addition of appropriate object-references in the input PDF
document T, through the use of entries of type /Key(·), so that the graph
F[π*] can be easily constructed from the page tree PT(T_w) of the resulting
watermarked document T_w.

Road Map. The paper is organized as follows. In Section 2 we establish the
notation and related terminology, and we present background results. In
Section 3, based on the three different representations of a self-inverting
permutation (SiP), i.e., the 1D-representation, the 2D-representation, and the
RPG-representation (the encoding of a permutation π* as a reducible
permutation graph F[π*]), we present the algorithms Embed_SiP.to.PDF-I,
Embed_SiP.to.PDF-II, and Embed_RPG.to.PDF, along with the corresponding
extraction algorithms, for embedding a watermark number (or, equivalently, a
self-inverting permutation π* or a reducible permutation graph F[π*]) into a
PDF document file. Finally, in Section 4 we conclude the paper and discuss
possible future extensions.

2 Background Results

In this section we give some definitions and the theoretical background we use 
towards the watermarking of Portable Document Format (PDF) documents. We 
first briefly present the different representations of a self-inverting permutation 
(SiP), and then we present the structure of PDF documents. 

1D-representation of SiP. Recently, we presented the one-dimensional
representation (1D-representation) of a self-inverting permutation (SiP) π*
and the one-dimensional marked representation of π* (1DM-representation), and
showed how to embed a SiP, represented in 1D space, into an audio signal. In
our 1D-representation, the elements of the permutation π = (π_1, π_2, ..., π_n)
are mapped to specific cells of an array B of size n² as follows:

• number π_i → entry B((i - 1)n + π_i)

or, equivalently, the cell at position (i - 1)n + π_i is labeled by the number
π_i, for each i = 1, 2, ..., n.

In our 1DM-representation, a permutation π over the set {1, 2, ..., n} is
represented by an array B* in which the cell at position (i - 1)n + π_i is
marked by a specific symbol; in our implementation, the symbol used is the
asterisk character "*".
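
As an illustration of the mapping above, the following minimal Python sketch builds the 1D- and 1DM-representations of a permutation. The function names and the use of None and the empty string for unmarked cells are assumptions made for this example, not part of the original implementation.

```python
def one_d_representation(perm):
    """Return the array B of size n*n for a permutation of 1..n.

    The cell at (1-based) position (i - 1) * n + pi_i holds the number pi_i;
    every other cell holds None. Positions are stored at index position - 1.
    """
    n = len(perm)
    cells = [None] * (n * n)
    for i, p in enumerate(perm, start=1):
        cells[(i - 1) * n + p - 1] = p
    return cells


def one_d_marked(perm, mark="*"):
    """1DM-representation: the same cells carry a mark instead of pi_i."""
    return [mark if c is not None else "" for c in one_d_representation(perm)]


pi_star = (4, 7, 6, 1, 5, 3, 2)
B = one_d_representation(pi_star)
marked_positions = [k + 1 for k, c in enumerate(B) if c is not None]
print(marked_positions)  # [4, 14, 20, 22, 33, 38, 44]
```

For the permutation (4, 7, 6, 1, 5, 3, 2) exactly one cell is used in each consecutive block of n = 7 cells, which is what allows the permutation to be read back from the marked positions alone.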

2D-representation of SiP. We have also presented the two-dimensional
representation of a SiP (2D-representation) and the two-dimensional marked
representation of a SiP (2DM-representation); note that these representations
have recently been used for watermarking images in the frequency domain.

We defined the 2D-representation of a SiP as the representation in which the
elements of the permutation π = (π_1, π_2, ..., π_n) are mapped to specific
cells of an n × n matrix A as follows:

• number π_i → entry A(π⁻¹_{π_i}, π_i)

or, equivalently,

• the cell at row i and column π_i is labeled by the number π_i, for each i =
  1, 2, ..., n,

since π⁻¹_{π_i} = i is the position of the number π_i in π.

In the 2DM-representation, the cell at row i and column π_i of matrix A is
marked by a specific symbol, for each i = 1, 2, ..., n.
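
The 2D mapping can be sketched in the same way. The snippet below is only an illustration under the assumption that the matrix is stored as a list of rows and that unmarked cells are shown as "."; the function names are hypothetical.

```python
def two_d_representation(perm):
    """Return the n x n matrix A with A[i][pi_i] = pi_i (1-based indices)."""
    n = len(perm)
    matrix = [[None] * n for _ in range(n)]
    for i, p in enumerate(perm, start=1):
        matrix[i - 1][p - 1] = p
    return matrix


def two_d_marked(perm, mark="*", empty="."):
    """2DM-representation: the labeled cells carry a mark instead of pi_i."""
    return [[mark if c is not None else empty for c in row]
            for row in two_d_representation(perm)]


pi_star = (4, 7, 6, 1, 5, 3, 2)
grid = two_d_marked(pi_star)
for row in grid:
    print(" ".join(row))
```

Each row and each column of the printed grid contains exactly one mark, since π is a permutation.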

We have presented algorithms for embedding the 2D-representation of a SiP in
an image. Recall that the matrix A incorporates important structural
properties which, in image watermarking, make it possible to detect geometric
transformations on the watermarked image. The properties of the matrix A are
the following:

[Figure 1: the watermark number w = 4 is mapped to the self-inverting
permutation π* = (4, 7, 6, 1, 5, 3, 2), which is shown in its
1D-representation (an array of 49 cells), in its 2D-representation, and as the
reducible permutation graph F[π*].]

Fig. 1. Three different representations of permutation π* = (4, 7, 6, 1, 5, 3, 2).



o the matrix A* is symmetric;

o the main diagonal of the symmetric matrix A* has always one and only one
marked cell;

o the marked cell on the diagonal is always in entry (i, i) of A*, where i =
⌊n*/2⌋ + 1, ⌊n*/2⌋ + 2, ..., n*.
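
The first two properties can be checked mechanically. The sketch below, whose function names are assumptions of this example, represents the marked cells of the 2DM-representation as a set of (row, column) pairs and tests symmetry and the number of marked diagonal cells.

```python
def marked_cells(perm):
    """Set of 1-based (row, column) pairs marked in the 2DM-representation."""
    return {(i, p) for i, p in enumerate(perm, start=1)}


def check_properties(perm):
    """Return (is_symmetric, rows of the marked cells on the main diagonal)."""
    cells = marked_cells(perm)
    symmetric = all((col, row) in cells for (row, col) in cells)
    diagonal = sorted(row for (row, col) in cells if row == col)
    return symmetric, diagonal


print(check_properties((4, 7, 6, 1, 5, 3, 2)))  # (True, [5])
print(check_properties((2, 3, 1)))              # (False, [])
```

The second call uses a permutation that is not self-inverting, for which the symmetry test fails as expected.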

We have also presented an efficient and easily implemented algorithm for
encoding numbers as reducible permutation graphs (or, for short, RPGs) through
the use of self-inverting permutations [12,13]. In particular, we have
proposed two such encoding algorithms: the algorithm Encode_SiP.to.RPG-I
applies to any permutation π and relies on domination relations on the
elements of π, whereas the algorithm Encode_SiP.to.RPG-II applies to a
self-inverting permutation π* produced in any way and relies on the decreasing
subsequences of π*. Figure 1 summarizes by an example the representations of
the permutation π* = (4, 7, 6, 1, 5, 3, 2).

2.1 Structure of a PDF Document 

The Portable Document Format (PDF) [2] is an open standard (defined in ISO
32000) which facilitates device- and platform-independent capture and
representation of rich information, such as text, multimedia, and graphics, in
a single medium. Thus the PDF format enables viewing and printing of a rich
document, independent of either application software or hardware. In this
section we present a structural analysis of a PDF file, by giving its basic
components.

(a)
    Header
    Body
    Cross-reference table
    Trailer

(b)
    %PDF-1.1
    1 0 obj
    << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>
    endobj
    2 0 obj
    << /Type /Outlines /Count 0 >>
    endobj
    3 0 obj
    << /Type /Pages /Kids [4 0 R] /Count 1 >>
    endobj
    4 0 obj
    << /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R
       /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>
    endobj
    5 0 obj
    << /Length 48 >>
    stream
    BT
    /F1 24 Tf
    100 700 Td
    (Hello World) Tj
    ET
    endstream
    endobj
    6 0 obj
    [/PDF /Text]
    endobj
    7 0 obj
    << /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica
       /Encoding /MacRomanEncoding >>
    endobj
    xref
    0 8
    0000000000 65535 f
    0000000012 00000 n
    0000000089 00000 n
    0000000145 00000 n
    0000000214 00000 n
    0000000381 00000 n
    0000000485 00000 n
    0000000518 00000 n
    trailer
    << /Size 8 /Root 1 0 R >>
    startxref
    642
    %%EOF

Fig. 2. (a) The structure of a PDF file; (b) The code of a PDF file
containing, in object 5 0 obj, the text "Hello World".


Object. An object is the basic element of a PDF file. Eight kinds of objects
are supported, namely Boolean, Numeric, String, Name, Array, Null, Dictionary,
and Stream objects. Objects may be labeled so that they can be referred to by
other objects. A labeled object is called an indirect object.







Fig. 3. (a) The main structural components of a PDF file; (b) The document structure 
of PDF file. 


File structure. The PDF file structure determines how objects are stored in a
PDF file, how they are accessed, and how they are updated. The file structure
(see Figure 2(a)) includes the following:

o a one-line header identifying the version of the PDF specification to which
the file conforms,

o a body containing the objects that make up the document contained in the
file,

o a cross-reference table containing information about the indirect objects in
the file, and

o a trailer giving the location of the cross-reference table and of certain
special objects within the body of the file.


Figure 2 shows an example of a PDF file and its internal file structure.
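
To make the four parts concrete, the following Python sketch assembles a small PDF file from a list of object bodies and computes the byte offsets of the cross-reference table itself. It is a minimal illustration, not a general PDF writer: the function name build_pdf is hypothetical, all objects are assumed to have generation number 0, and the object bodies are the ones of the "Hello World" example.

```python
def build_pdf(objects, root_obj=1, version="1.1"):
    """Assemble header, body, cross-reference table, and trailer.

    objects[k] is the body (bytes) of indirect object k + 1, generation 0.
    """
    out = bytearray(f"%PDF-{version}\n".encode("ascii"))        # header
    offsets = []
    for num, body in enumerate(objects, start=1):                # body
        offsets.append(len(out))                                 # byte offset
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_pos = len(out)                                          # xref table
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"                              # free entry
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")          # 20 bytes each
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root {root_obj} 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")   # trailer
    return bytes(out)


content = b"BT\n/F1 24 Tf\n100 700 Td\n(Hello World) Tj\nET"
objects = [
    b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    b"<< /Type /Outlines /Count 0 >>",
    b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R "
    b"/Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    b"[/PDF /Text]",
    b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica "
    b"/Encoding /MacRomanEncoding >>",
]
pdf = build_pdf(objects)
```

Because the offsets are computed while the body is written, the number after startxref always points at the xref keyword, and every table entry points at the start of its object.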

Document structure. The PDF document structure specifies how the basic object
types are used to represent components of a PDF document: pages, fonts,
annotations, and so forth. The document structure of a PDF file is organized
as an object tree topped by the Catalog, which includes the Page tree, the
Outline hierarchy, and the Article threads. The Outline hierarchy holds the
bookmarks of the PDF, while the Page tree consists of Page and Pages objects,
the latter recording the total number of pages and a reference to each page.
A Page object, the main body of a PDF file, is the most important object; it
involves the fonts applied, the text, pictures, page size, and so on. The
organization of the other objects is analogous to that of the Page tree.
Figure 3 illustrates the structure of the object hierarchy.

3 Watermarking PDF Documents 

In this section we describe embedding algorithms for encoding a SiP π* into a
digital document T. More specifically, we embed the permutation π* into a PDF
document by exploiting (i) the one-dimensional representation of π*, (ii) the
two-dimensional representation of π*, and (iii) the encoding of π* as a
reducible permutation graph F[π*].

3.1 Embed Watermark into PDF - I 

We first design an embedding algorithm for watermarking a PDF document by
exploiting the 1D-representation of a permutation π*. The marking is performed
by increasing the space between two consecutive words in a paragraph of T.

Let B* be the 1D array of size n = n* × n* which represents the permutation π*
of length n*, and let (w_1, s_1), (w_2, s_2), ..., (w_n, s_n) be n pairs of
type "word-space" of a paragraph par of the input PDF document; recall that
the entry B*((i - 1)n* + π*_i) contains the symbol "*", 1 ≤ i ≤ n*. The
algorithm increases by a small value c the j-th space s_j of the pair
(w_j, s_j) if B*(j) = "*", i.e., if j = (i - 1)n* + π*_i for some i,
1 ≤ i ≤ n*.

We next give a high-level description, with respect to PDF modification, of 
our proposed embedding algorithm. 

Algorithm Embed_SiP.to.PDF-I

1. Compute the 1DM-representation of the permutation π*, i.e., construct the
array B* of size n = n* × n* where the ((i - 1)n* + π*_i)-th entry of B*
contains the symbol "*", 1 ≤ i ≤ n*;

2. Select an appropriate paragraph par on a page P of the PDF document T to
embed the self-inverting permutation π*;

3. Partition the paragraph par into n pairs (w_1, s_1), (w_2, s_2), ...,
(w_n, s_n), where w_i and s_i are the i-th word and space, respectively, in
the selected paragraph par, 1 ≤ i ≤ n;

4. For each pair (w_i, s_i) s.t. B*(i) = "*", increase the space s_i or,
equivalently, the distance d(w_i, w_{i+1}) between words w_i and w_{i+1}, by a
relatively small value c, 1 ≤ i ≤ n;

5. Return the watermarked PDF document T_w.

Fig. 4. (a) The initial PDF document T; (b) The watermarked PDF document T_w
using the 1D-representation of permutation π* = (4, 7, 6, 1, 5, 3, 2); the red
circles indicate the marks.
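
The marking step of Embed_SiP.to.PDF-I can be sketched on an abstract model of a paragraph. The example below assumes that the paragraph has already been reduced to a list of inter-word gap widths (in points); reading and rewriting the gaps inside an actual PDF content stream is outside the scope of the sketch, and the function name and the default value of c are hypothetical.

```python
def embed_sip_1d(gaps, perm, c=0.5):
    """Widen by c the gaps whose (1-based) position is marked in B*.

    gaps: widths s_1, ..., s_m of the spaces of a paragraph, m >= n* * n*
    perm: the self-inverting permutation pi* as a tuple over 1..n*
    """
    n_star = len(perm)
    if len(gaps) < n_star * n_star:
        raise ValueError("the paragraph needs at least n* x n* word-space pairs")
    marked = list(gaps)
    for i, p in enumerate(perm, start=1):
        j = (i - 1) * n_star + p        # position of the mark in B*
        marked[j - 1] += c
    return marked


pi_star = (4, 7, 6, 1, 5, 3, 2)
original = [3.0] * 60                   # a paragraph with uniform spacing
watermarked = embed_sip_1d(original, pi_star)
widened = [j + 1 for j, (a, b) in enumerate(zip(original, watermarked)) if b > a]
print(widened)  # [4, 14, 20, 22, 33, 38, 44]
```

Only n* of the n* × n* gaps change, so the visual impact depends mainly on how small c is relative to the normal word spacing.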

Extraction. The extraction algorithm, which we call Extract_PDF.from.SiP-I,
operates as follows: it takes as input the watermarked PDF document T_w,
locates the paragraph par, and computes the permutation π* by finding the
positions of the words w_i such that:

o d(w_i, w_{i+1}) > d(w_{i-1}, w_i), or
o d(w_i, w_{i+1}) > d(w_{i+1}, w_{i+2})

where d(w_i, w_j) is the distance between words w_i and w_j in the paragraph
par of T_w, 1 ≤ i ≤ n; note that an appropriate paragraph par contains more
than n words.
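
The two comparison rules above can be sketched as follows, again on a list of gap widths rather than on a real PDF. The sketch assumes that unmarked gaps have (nearly) equal width, which will not hold for every typeset paragraph; the function name and the tolerance parameter tol are assumptions of this example.

```python
def extract_sip_1d(gaps, n_star, tol=1e-6):
    """Recover pi* from gap widths: a gap is a mark if it is wider than its
    left neighbour or wider than its right neighbour."""
    total = n_star * n_star
    if len(gaps) < total:
        raise ValueError("not enough gaps for an n* x n* array")
    perm = [0] * n_star
    for j in range(total):
        wider_than_prev = j > 0 and gaps[j] > gaps[j - 1] + tol
        wider_than_next = j + 1 < len(gaps) and gaps[j] > gaps[j + 1] + tol
        if wider_than_prev or wider_than_next:
            block, offset = divmod(j, n_star)   # block i - 1, column pi_i - 1
            if perm[block]:
                raise ValueError("more than one mark in a block")
            perm[block] = offset + 1
    if 0 in perm:
        raise ValueError("a block without a mark was found")
    return tuple(perm)


gaps = [3.0] * 50
for pos in (4, 14, 20, 22, 33, 38, 44):
    gaps[pos - 1] += 0.5
print(extract_sip_1d(gaps, 7))  # (4, 7, 6, 1, 5, 3, 2)
```

Using both neighbours matters when two marks are adjacent, e.g., when one block is marked in its last cell and the next block in its first cell: each of the two gaps is then wider than its unmarked neighbour on the other side.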

3.2 Embed Watermark into PDF - II 

In this section we describe a different approach for embedding a
self-inverting permutation π* into a digital document T, by exploiting the
two-dimensional representation of the permutation π*.

The main idea behind the embedding algorithm, which we call
Embed_SiP.to.PDF-II, is similar to that of the algorithm Embed_SiP.to.Image-F.
The most important aspect of this idea is that it suggests a way in which the
permutation π* can be represented with a 2D-representation and, since pages of
a PDF document T are two-dimensional objects, that representation can be
efficiently marked on them, resulting in the watermarked PDF document T_w. In
a similar way as in our image watermarking approach, such a 2D-representation
can be efficiently extracted from a watermarked PDF document T_w and converted
back to the self-inverting permutation π*.

Fig. 5. (a) The initial PDF document T; (b) The watermarked PDF document T_w
using the 2D-representation of permutation π* = (4, 7, 6, 1, 5, 3, 2); the red
stars indicate the marks.


Let A* be the 2D matrix of size n* × n* which represents the permutation π* of
length n*. The marking of the input PDF document T is performed by selecting
an appropriate page P of T and setting n* objects O_i (e.g., characters,
symbols, images) in specific positions on page P, 1 ≤ i ≤ n*. In fact, we set
an object O_i in the position with coordinates (x'_i, y'_i) on page P if
A*(x_i, y_i) = "*", where 1 ≤ x_i, y_i ≤ n* and 0 ≤ x'_i, y'_i ≤ size(P); note
that (0, 0) is the lower-left point (or, equivalently, the bottom-left corner)
of the page P.

The algorithm takes as input a SiP π* and a PDF document T, and returns the
watermarked document T_w; it consists of the following steps.

Algorithm Embed_SiP.to.PDF-II

1. Compute the 2DM-representation of the self-inverting permutation π*, i.e.,
construct an array A* of size n* × n* s.t. the entry A*(i, π*_i) contains the
symbol "*", 1 ≤ i ≤ n*;

2. Select an appropriate page P to embed the permutation π* and compute the
size size(P) of the page P, say, N × M;

3. Segment the PDF page P into n* × n* grid-cells C_ij of size
N/n* × M/n*, 1 ≤ i, j ≤ n*;

4. For each grid-cell C_ij s.t. A*(i, j) = "*", mark the cell C_ij by setting
a symbol, with an appropriate color, in any position inside C_ij of P,
1 ≤ i, j ≤ n*, resulting in the marked document T_w;

5. Return the watermarked PDF document T_w.

Fig. 6. The watermarked DS(T_w) which encodes the RPG of π* = (4, 5, 3, 1, 2).
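
Steps 3 and 4 of Embed_SiP.to.PDF-II amount to computing one page position per marked grid-cell. The sketch below computes the centre of each marked cell; placing the mark at the centre, counting rows from the top of the page, and the function name are all assumptions of this example (the algorithm allows any position inside the cell), and drawing the symbol on the page is not shown.

```python
def mark_positions(perm, page_width, page_height):
    """Return one (x, y) page position per marked grid-cell C_{i, pi_i}.

    The origin (0, 0) is the bottom-left corner of the page. Row i is
    counted from the top of the page and column pi_i from the left.
    """
    n_star = len(perm)
    cell_w = page_width / n_star
    cell_h = page_height / n_star
    positions = []
    for i, p in enumerate(perm, start=1):
        x = (p - 0.5) * cell_w                  # centre of column pi_i
        y = page_height - (i - 0.5) * cell_h    # centre of row i
        positions.append((x, y))
    return positions


pi_star = (4, 7, 6, 1, 5, 3, 2)
positions = mark_positions(pi_star, 612, 792)   # page size of a 612 x 792 MediaBox
for x, y in positions:
    print(f"{x:7.2f} {y:7.2f}")
```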

Extraction. The algorithm which extracts the permutation π* from the
watermarked PDF document T_w, which we call Extract_PDF.from.SiP-II, operates
in a similar way as the corresponding extraction algorithm for images: it
takes as input the watermarked PDF document T_w, locates the marked page P,
computes its N × M size, and segments P into n* × n* grid-cells C_ij of size
N/n* × M/n*; then, it computes the permutation π* by finding the coordinates
(x_i, y_i) of the n* symbols in the page P, 1 ≤ i ≤ n*.
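
The grid-based extraction can be sketched as the inverse of the cell computation: each mark position is mapped back to a (row, column) pair. The sketch assumes that the positions of the n* marks have already been located on the page, that rows are counted from the top, and that the origin is the bottom-left corner; the function name is hypothetical.

```python
def extract_sip_2d(positions, n_star, page_width, page_height):
    """Map n* mark positions on a page back to the permutation pi*."""
    cell_w = page_width / n_star
    cell_h = page_height / n_star
    perm = [0] * n_star
    for x, y in positions:
        col = min(int(x // cell_w), n_star - 1) + 1
        row = min(int((page_height - y) // cell_h), n_star - 1) + 1
        if perm[row - 1]:
            raise ValueError("two marks in the same row of the grid")
        perm[row - 1] = col
    if 0 in perm:
        raise ValueError("a row of the grid carries no mark")
    return tuple(perm)


cell_w, cell_h = 612 / 7, 792 / 7
marks = [((p - 0.5) * cell_w, 792 - (i - 0.5) * cell_h)
         for i, p in enumerate((4, 7, 6, 1, 5, 3, 2), start=1)]
recovered = extract_sip_2d(marks, 7, 612, 792)
print(recovered)  # (4, 7, 6, 1, 5, 3, 2)
```

Because only the cell containing a mark matters, the recovery tolerates small displacements of a mark as long as it stays inside its grid-cell.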

3.3 Embed an RPG into a PDF 

In this section we describe a watermarking algorithm for embedding a
self-inverting permutation π* into a PDF document T, by exploiting the
RPG-representation of π* and the structure of a PDF document T.

Indeed, we have recently proposed two algorithms, namely Encode_SiP.to.RPG-I
and -II, for encoding self-inverting permutations π* as reducible permutation
graphs F[π*]. Moreover, in this paper we have described the document structure
DS(T) of a PDF document T (see Subsection 2.1); note that the document
structure of a PDF file always contains a node, namely Document-catalog, and a
page tree PT(T) rooted at node Page-tree, denoted by root(pt); see
Figure 3(b).

In light of the two encoding algorithms Encode_SiP.to.RPG-I and -II, we next
present an algorithm for embedding a reducible permutation graph F[π*] into a
PDF document T. The main idea behind the proposed embedding algorithm is a
systematic addition of appropriate object-references in selected nodes of the
page-tree PT(T) of the document structure DS(T), through the use of entries of
type /Key(·), so that the graph F[π*] can be easily constructed from the
page-tree PT(T_w) of the resulting watermarked document T_w.

Let F[π*] be a reducible permutation graph produced by one of our two encoding
algorithms (i.e., Encode_SiP.to.RPG-I or -II), and let u_{n+1}, u_n, ..., u_1,
u_0 be the nodes of the graph F[π*]; note that F[π*] does not contain the
back-edge (u_0, u_{n+1}). In order to simplify the extraction process, the
graph F[π*] which is embedded into a PDF document T contains one extra
back-edge, i.e., the edge (u_0, u_{n+1}); see [12,13].

The algorithm for embedding a reducible permutation graph F[π*] into a PDF
document T is called Encode_RPG.to.PDF and is described below.

Algorithm Encode_RPG.to.PDF

1. Compute the document structure DS(T) of the input PDF document T and locate
its page-tree PT(T); let node(dc) be the document catalog node of structure
DS(T) and root(pt) be the root node of the page tree PT(T); see Figure 3(b);

2. Compute a path ρ(T) = (v_{n+1}, v_n, ..., v_1, v_0) on n + 2 nodes (i.e.,
objects) of the page-tree PT(T) s.t. v_{n+1} = root(pt), and set s = v_{n+1}
and t = v_0;

3. Assign an exact pairing (i.e., 1-1 correspondence) of the n + 2 nodes of
path ρ(T) to the nodes u_{n+1}, u_n, ..., u_1, u_0 of the watermark graph
F[π*];

4. For each back-edge (u_i, u_j) of the graph F[π*] (i.e., u_j > u_i), add the
forward-edge (v_j, v_i) in page-tree PT(T) by adding in object [v_j 0 obj] an
entry of type /Key(v_i 0 R); add in object [v_{n+1} 0 obj] an entry of type
/Key(v_0 0 R);

5. Return the modified PDF document T, i.e., the watermarked document T_w.
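
Steps 3 and 4 of Encode_RPG.to.PDF can be sketched on a toy model in which the path is a list of object numbers and the watermark graph is given only by its back-edges. Everything concrete in the sketch is an assumption made for illustration: the function name, the object numbers, the example back-edges (which are not claimed to form a valid graph F[π*]), and the rule of using /Kids for references added to the root and the placeholder /Key elsewhere.

```python
def embed_rpg(path, back_edges):
    """Return the reference entries to add, keyed by object number.

    path:       object numbers [v_{n+1}, v_n, ..., v_1, v_0] of the chosen path
    back_edges: pairs (i, j) with j > i, standing for the back-edge (u_i, u_j)
    """
    top = len(path) - 1                                  # index n + 1
    node = {idx: path[top - idx] for idx in range(top + 1)}   # u_idx -> v_idx
    added = {}
    for i, j in back_edges:
        if not 0 <= i < j <= top:
            raise ValueError(f"({i}, {j}) is not a back-edge on this path")
        key = "/Kids" if j == top else "/Key"            # the root uses /Kids
        added.setdefault(node[j], []).append((key, node[i]))   # v_j -> v_i
    # the extra edge (u_0, u_{n+1}) that simplifies the extraction
    added.setdefault(node[top], []).append(("/Kids", node[0]))
    return added


path = [10, 11, 12, 13, 14]             # hypothetical objects v_4, ..., v_0
back_edges = [(1, 3), (2, 4), (3, 4)]   # hypothetical back-edges (u_i, u_j)
added = embed_rpg(path, back_edges)
for obj, entries in added.items():
    for key, target in entries:
        print(f"[{obj} 0 obj] gets {key}({target} 0 R)")
```

Keeping the added entries separate from the existing ones, as this model does, is a simplification; in a real file the new references live in the same dictionaries as the original entries.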

Let us briefly discuss the way we add a forward-edge in the page-tree PT(T);
recall that, in Step 4 of algorithm Encode_RPG.to.PDF, we add the forward-edge
(v_j, v_i) in page-tree PT(T) by adding in object [v_j 0 obj] an entry of type
/Key(v_i 0 R). The entry /Key(v_i 0 R) may be of various types; note that
/Key(·) is used as a parameter in our algorithm's description.

In our implementation, for a forward-edge (v_j, v_i) such that the object
[v_j 0 obj] is not the root node root(pt) of the page-tree PT(T), we always
choose the entry /Key(v_i 0 R) which we add in object [v_j 0 obj] to be of the
same type as object [v_i 0 obj]. In the case where v_j = root(pt), we choose
the entry /Key(v_i 0 R) to be of type /Kids(·).

For example, in Figure 6 we have added forward-edges from object [29 0 obj] to
object [3 0 obj], from object [29 0 obj] to object [24 0 obj], from object
[3 0 obj] to object [13 0 obj], etc. Thus, in our implementation we have added
in the root-node object [29 0 obj] the entries /Kids(3 0 R) and /Kids(24 0 R),
in object [3 0 obj] the entry /XObject(13 0 R), and in object [13 0 obj] the
entries /ColorSpace(6 0 R) and /R9(5 0 R).

Remark 3.1. Let T be a PDF file and let PT(T) be the page-tree of the document
structure DS(T). A node of the page-tree PT(T) may contain several entries
/Key(·) of various types. We mention that some types are required for the
entries in specific nodes of PT(T); for example, an intermediate node of the
page-tree PT(T) requires the following four entries: /Type(·), /Parent(·),
/Kids(·), and /Count(·); in the root node root(pt), the entry /Parent(·) is
not present.

Extraction. We next describe the corresponding extraction algorithm, which we
call Extract_RPG.from.PDF; it extracts the graph F[π*] from the PDF document
T_w watermarked by the embedding algorithm Encode_RPG.to.PDF. The algorithm
works as follows:

• Take as input the PDF document T_w watermarked by the embedding algorithm
Encode_RPG.to.PDF, compute the document structure DS(T_w) of T_w, and locate
its page tree PT(T_w); then, find in object root(pt), where root(pt) is the
root of the tree PT(T_w), the entry /Kids(v_k 0 R) s.t. v_k is not a child of
root(pt), and set v_{n+1} = root(pt) and v_0 = v_k;

• Compute the path ρ(T_w) = (v_{n+1}, v_n, ..., v_1, v_0) of PT(T_w), from
node root(pt) to v_0, and assign an exact pairing (i.e., 1-1 correspondence)
of the n + 2 nodes of path ρ(T_w) to the nodes u_{n+1}, u_n, ..., u_1, u_0 of
a graph F[π*]; initially, E(F[π*]) = ∅;

• Add the edges (u_{i+1}, u_i) in F[π*] for i = n, n - 1, ..., 0, and the edge
(u_i, u_j) iff (v_i, v_j) is a forward edge in the page tree PT(T_w);

• Delete the edge (u_{n+1}, u_0) from the graph F[π*];

• Return the graph F[π*].
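
The extraction can be sketched on a toy model in which the added references are available as a mapping from an object number to a list of (key, target) pairs. The sketch makes several assumptions for illustration: the added references can be told apart from the ordinary entries of the page tree, each added reference from v_j to v_i is read as the back-edge (u_i, u_j) (i.e., Step 4 of Encode_RPG.to.PDF is inverted), and the object numbers and edges are hypothetical.

```python
def extract_rpg(path, added):
    """Rebuild the list edges and back-edges of the watermark graph.

    path:  object numbers [v_{n+1}, v_n, ..., v_1, v_0]
    added: dict mapping an object number to (key, target) reference entries
    """
    top = len(path) - 1                                   # index n + 1
    index = {obj: top - pos for pos, obj in enumerate(path)}  # v -> its index
    # list edges (u_{i+1}, u_i) for i = n, n - 1, ..., 0
    list_edges = [(i + 1, i) for i in range(top - 1, -1, -1)]
    back_edges = set()
    for src, entries in added.items():
        for _key, dst in entries:
            j, i = index[src], index[dst]
            back_edges.add((i, j))                        # back-edge (u_i, u_j)
    back_edges.discard((0, top))                          # drop the extra edge
    return list_edges, sorted(back_edges)


path = [10, 11, 12, 13, 14]
added = {
    11: [("/Key", 13)],
    10: [("/Kids", 12), ("/Kids", 11), ("/Kids", 14)],
}
list_edges, back_edges = extract_rpg(path, added)
print(list_edges)   # [(4, 3), (3, 2), (2, 1), (1, 0)]
print(back_edges)   # [(1, 3), (2, 4), (3, 4)]
```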

It is easy to see that, by construction, the returned graph F[π*] is a
reducible permutation graph produced by either algorithm Encode_SiP.to.RPG-I
or algorithm Encode_SiP.to.RPG-II. Thus, F[π*] has the following property: the
structure which results after deleting

(i) all the forward edges (u_{i+1}, u_i) of F[π*], 0 ≤ i ≤ n, and

(ii) the node u_0

is either the tree T_d[π*] or the tree T_s[π*] produced during the execution
of either the decoding algorithm Decode_RPG.to.SiP-I or algorithm
Decode_RPG.to.SiP-II, respectively. Thus, we can efficiently extract the
self-inverting permutation π* embedded into a PDF document T by algorithm
Encode_RPG.to.PDF.

4 Concluding Remarks

In this paper we presented embedding algorithms, along with their
corresponding extraction algorithms, for watermarking PDF documents T using
three different representations of a self-inverting permutation π*, namely the
1D-representation, the 2D-representation, and the RPG-representation; note
that the RPG-representation is the encoding of a permutation π* as a reducible
permutation graph F[π*].

The main features of our algorithms, i.e., the way they mark a PDF document T
or, equivalently, the way they embed a self-inverting permutation π* into
document T, are summarized as follows:

o In the first algorithm Embed_SiP.to.PDF-I the marking of a PDF document T is
performed by increasing the distance (or, space) between two consecutive words
in a paragraph of T.

o The main idea behind the second algorithm Embed_SiP.to.PDF-II is based on
the fact that π* has a 2D-representation and, since pages of a PDF document T
are two-dimensional objects, it can be efficiently used to mark specific
positions on a page of T, resulting in the watermarked PDF document T_w.

o The third, graph-based embedding algorithm Encode_RPG.to.PDF uses a
different approach: it exploits the structure of a PDF document T and embeds
the graph F[π*] into T by adding appropriate object-references in the document
T, through the use of entries of type /Kids(k 0 R), so that the graph F[π*]
can be easily constructed from the page tree PT(T_w) of the resulting
watermarked document T_w.

In light of our graph-based embedding algorithm Encode_RPG.to.PDF, it would be
very interesting to investigate the possibility of altering other components
of the document structure of a PDF file in order to embed the graph F[π*]; we
leave it as a direction for future work.

Moreover, an interesting open question is whether the embedding approaches and
techniques used in this paper can help develop efficient encoding algorithms
having "better" properties with respect to text attacks; we leave it as an
open problem for future investigation.